
  • composing a local macro containing the string "bfpm_l`iu'_1420"

    Hello,

    I'm writing syntax to replace all variable names of the last panel wave with the variable names of the actual panel wave in a large dofile.

    I have a stata file holding 'oldvar' and 'newvar'. The text of the dofile has been transferred into a strL-variable with:

    Code:
    generate strL odofile = fileread(`"`pfad1s'/`synold'"')
    replace odofile = "" if _n > 1
    If there were only "normal" variable names to be replaced, it would be an easy job that could be done with

    Code:
    replace odofile = `u'subinstr(odofile, `"`varold'"', `"`varnew'"', .) if _n == 1
    while looping over all varold/varnew-pairs.

    But the variable names contain running indexes, which in the do-file text are represented by a local macro that changes its value according to the loop index.
    Here is an excerpt from the dofile that I want to update to clarify this:

    Code:
    local n = 15
    forvalues i = 2(1)`n' {
        /* Generation of the variable parts of the variable names */
        local j = `i'-1
        if `i' < 10 {
            local in 0`i'
        }
        if `i' >= 10 {
            local in `i'
        }
        if `j' < 10 {
            local iu 0`j'
        }
        if `j' >= 10 {
            local iu `j'
        }
        
        use "filename", clear
            
        keep hhnr persnr welle bfpm_monin bfpm_l`iu'_1801 bfpm_l`iu'_1802 bfpm_l`iu'_19 bfpm_l`iu'_20 bfpm_l`iu'_21 ///
            bfpm_l`iu'_22
    The problem I'm not able to solve is how to construct a macro that contains exactly (for instance) the string "bfpm_l`iu'_1802".
    The oldvar/newvar table represents the placeholders within the variable names by "XX".
    So I have to construct a string holding the part of the variable name before `iu', plus `iu', plus the part after `iu'.

    The part before `iu' is captured by:
    Code:
    local ostr1 = `u'substr(`"`varold'"',1,6)
    The part after `iu' is captured by:
    Code:
    local ostr2 = `u'substr(`"`varold'"',9,`u'strlen(`"`varold'"')-8)
    But how do I capture "`iu'" and construct `varold' by combining `ostr1'`placeholder'`ostr2'?
    I've tried every version I can think of, including compound quotes and the escape character \, but nothing seemed to work.

    I hope very much someone can help

    Thanks, Klaudia
    Last edited by Klaudia Erhardt; 09 May 2017, 05:04.

  • #2
    Hi Klaudia,

    could you please clarify this a little? I'm not sure I understood your source material.

    The dataset holding the renaming information looks like this:
    Code:
    clear
    input str2(oldname newname)
    "AB" "EF"
    "CD" "GH"
    end
    Correct? And you want to replace all occurrences of "AB" in the do-file in question by "EF", and "CD" should be replaced by "GH". Is this what you're trying to achieve?

    Regards
    Bela
    Last edited by Daniel Bela; 09 May 2017, 06:21. Reason: typo



    • #3
      Hi Daniel,

      in principle you describe correctly what I try to achieve. Only that the strings are more complicated, they look like:

      varname_bfp varname_bgp
      bfpm_l14_3701 bgpm_l14_37
      bfpm_l15_3701 bgpm_l15_37
      bfpm_lXX_3701 bgpm_lXX_37
      bfpm_l01_3803 bgpm_l01_3802
      bfpm_l02_3803 bgpm_l02_3802


      The varname pair bfpm_lXX_3701 / bgpm_lXX_37 stands for the following occurrences in the do-file:
      bfpm_l`iu'_3701 or bfpm_l`in'_3701 or bfpm_l??_3701 that have to be replaced by:
      bgpm_l`iu'_37 or bgpm_l`in'_37 or bgpm_l??_37

      As there are many variable names, I want to loop over varname_bfp and transfer each varname_bfp and varname_bgp into 2 local macros that are needed to construct the macros `varold' and `varnew' in the replace-command (see post #1)

      What might be confusing in my post #1 is that `varold' and `varnew' change their contents: in the first step they contain the original entries of varname_bfp and varname_bgp, which is

      bfpm_lXX_3701 and bgpm_lXX_37 , for instance.

      In the second step they contain successively:

      bfpm_l`iu'_3701 and bgpm_l`iu'_37
      bfpm_l`in'_3701 and bgpm_l`in'_37
      bfpm_l??_3701 and bgpm_l??_37

      For 'simple' variables it would be:
      bfpm_l15_3701 and bgpm_l15_37 in both steps which was easy to solve.

      What I was not able to achieve is the second step for the variable names containing the placeholders.

      Do you have a solution to this?

      Greetings, Klaudia



      • #4
        Luckily, I found a feasible solution to my problem:

        I succeeded in replacing all occurrences of `iu' and `in' with *iu* and *in*.
        After that, it was possible to construct old and new varnames containing placeholders.
        It worked as follows:

        Code:
        use `"`pfad1d'/`konk'"', clear    /* Stata file with correspondence of old and new varnames */
        generate strL odofile = fileread(`"`pfad1s'/`synold'"')  /* reads do-file into strL variable */
        replace odofile = "" if _n > 1   /* because text is needed only once */
        
        /* replace the unmanageable `iu' and `in' strings with manageable strings: */
        replace odofile = `u'subinstr(odofile, "\`iu'", "*iu*", .) if _n == 1
        replace odofile = `u'subinstr(odofile, "\`in'", "*in*", .) if _n == 1
        Then the transformations of odofile[1] follow, and after that the replacement is reversed:

        Code:
        replace odofile = `u'subinstr(odofile, "*iu*", "\`iu'", .) if _n == 1
        replace odofile = `u'subinstr(odofile,  "*in*", "\`in'", .) if _n == 1
        It took me quite a few hours, but I'm happy to have found a solution!
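
        The round trip described above can be sketched on a single line of text. This is only an illustration under assumptions: the variable name txt and the one-line "do-file" are hypothetical, usubinstr() needs Stata 14 or later, and the approach relies on the stand-in token *iu* not occurring anywhere else in the real do-file.

        ```stata
        // Hypothetical one-line "do-file" held in a strL variable
        clear
        set obs 1
        generate strL txt = "keep bfpm_l\`iu'_3701"

        // 1. swap the macro reference for a plain token
        replace txt = usubinstr(txt, "\`iu'", "*iu*", .)
        // 2. rename, using the token wherever the table has XX
        replace txt = usubinstr(txt, "bfpm_l*iu*_3701", "bgpm_l*iu*_37", .)
        // 3. restore the macro reference
        replace txt = usubinstr(txt, "*iu*", "\`iu'", .)

        display txt[1]    // keep bgpm_l`iu'_37
        ```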



        • #5
          Hi Klaudia,

          glad to hear you found a solution to the problem; I guess, the short answer to the original questions would have been: Have a try with [1] escaping local macro delimiters with backslashes (which you did) and [2] avoid expansion of macros beyond the first level of expansion by using `macval(macroname)' instead of `macroname' (which you missed).
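
          As a minimal sketch of points [1] and [2], assuming a hypothetical local called name and an arbitrary value for iu:

          ```stata
          local iu 07
          // [1] the backslash keeps `iu' from being expanded when the macro is defined,
          //     so the macro stores the reference itself
          local name "bfpm_l\`iu'_1802"

          // [2] macval() expands only the outer macro, leaving the stored reference intact
          display `"`macval(name)'"'    // bfpm_l`iu'_1802

          // ordinary expansion also resolves the inner macro
          display `"`name'"'            // bfpm_l07_1802
          ```

          The second display is the behaviour that gets in the way when the literal text `iu' has to be searched for in the do-file.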

          I would not recommend modifying the source material's "unmanageable" strings by brute force, as you did. It is not needed, and it may introduce modifications in places where you do not want them (it would, for instance, change the comment in line 3 of my example do-file below, which would be undesired).

          The long answer follows below for future reference:
          It took me a while to reconstruct the problem, a complete minimal example would have been helpful to speed this up; here it is:
          Code:
          // create example data containing rename pairs
          clear
          timer clear
          input str20(oldname newname)
          "bfpm_l14_3701" "bgpm_l14_37"
          "bfpm_l15_3701" "bgpm_l15_37"
          "bfpm_lXX_3701" "bgpm_lXX_37"
          "bfpm_l01_3803" "bgpm_l01_3802"
          "bfpm_l02_3803" "bgpm_l02_3802"
          end
          
          // create tempnames
          tempname sourcefile targetfile
          local placeholders `""??" "\`in'" "\`iu'""'
          
          // create example do-file to replace contents in
          file open `sourcefile' using "test-dofile.do" , replace text write
          file write `sourcefile' `"bfpm_l14_3701"' _newline
          file write `sourcefile' `"bfpm_l15_3701"' _newline
          file write `sourcefile' `"* in the following loop, \`iu' will stand for -something-"' _newline
          file write `sourcefile' `"bfpm_l\`iu'_3701"' _newline
          file write `sourcefile' `"bfpm_l\`in'_3701"' _newline
          file write `sourcefile' `"bfpm_l??_3701"' _newline
          file close `sourcefile'
          The traditional way (i.e. this works before Stata 14) to solve this would have been direct file access to replace the contents of the do-file line by line. This can be done using -file read- and -file write- statements, as you probably know:
          Code:
          // open files, source file for reading, target file for writing
          local placeholders `""??" "\`in'" "\`iu'""'
          tempname sourcefile targetfile
          file open `sourcefile' using "test-dofile.do" , read text
          file open `targetfile' using "result-dofile.do" , write text replace
          * read a single line from source file
          file read `sourcefile' line
          while r(eof)==0 {
              * loop over all rename-pairs, replace each occurrence in the current line
              local newline : copy local line
              forvalues num=1/`c(N)' {
                  local oldname=oldname[`num']
                  local newname=newname[`num']
                  if (!strmatch(`"`macval(oldname)'"',"*XX*")) {
                      local newline : subinstr local newline `"`oldname'"' `"`newname'"' , all
                  }
                  else {
                      foreach placeholder of local placeholders {
                          local oldphname : subinstr local oldname `"XX"' `"`macval(placeholder)'"' , all
                          local newphname : subinstr local newname `"XX"' `"`macval(placeholder)'"' , all
                          local newline : subinstr local newline `"`macval(oldphname)'"' `"`macval(newphname)'"' , all
                      }
                  }
              }
              * write the modified line to the target file, and re-read the next source file line
              file write `targetfile' `"`macval(newline)'"' _newline
              file read `sourcefile' line
          }
          file close `sourcefile'
          file close `targetfile'
          This, however, involves two file calls to the file system per do-file line (one reading the original, one writing the potentially modified one). This is necessarily slow. The solution you came up with (reading the source file into a strL variable, modifying it in memory, and writing it back to the file system) is possibly much faster, depending on the actual size of the do-file. Content replacement, however, can be done in a very similar way:
          Code:
          local placeholders `""??" "\`in'" "\`iu'""'
          generate strL sourcedofile=fileread(`"test-dofile.do"') in 1
          forvalues num=1/`c(N)' {
              local oldname=oldname[`num']
              local newname=newname[`num']
              if (!strmatch(`"`macval(oldname)'"',"*XX*")) {
                  replace sourcedofile=usubinstr(sourcedofile,`"`macval(oldname)'"',`"`macval(newname)'"',.) in 1
              }
              else {
                  foreach placeholder of local placeholders {
                      local oldphname : subinstr local oldname `"XX"' `"`macval(placeholder)'"' , all
                      local newphname : subinstr local newname `"XX"' `"`macval(placeholder)'"' , all
                      replace sourcedofile=usubinstr(sourcedofile,`"`macval(oldphname)'"',`"`macval(newphname)'"',.) in 1
                  }
              }
          }
          generate targetdofile=filewrite(`"result-dofile.do"',sourcedofile[1],1)

          Regards
          Bela



          • #6
            Hi Bela,

            just a short answer, because I'm on my way home...
            Thank you for pointing me to macval(), which I had not thought of. There may also be further ways to optimize my code; I'll look at it tomorrow.

            Two remarks though:
            - You are not right that my code may have unexpected effects (like changing the comment), because at the end I reverse the replacement, so *iu* etc. becomes `iu' etc. again.
            - I considered going over the original code line by line and decided otherwise, because it would have implied about 600 loops over the variable correspondence list instead of 1.

            Thank you anyway for investing quite a bit of time. I'll go over your last syntax proposal tomorrow to see if there are optimizations of my code beyond different programming styles.

            Greetings, Klaudia



            • #7
              Hi Bela,
              I just tested your syntax (last code part in your post #5). It replaced the old varnames only where no placeholders were involved. Your code did not generate errors. I had to replace the filenames and paths; to be precise, I ran your code as follows, with the variable correspondence table as the active dataset:

              Code:
              /* Bela's code ################################################ */
              
              *set trace on
              *set tracedepth 1
              
              local placeholders `""??" "\`in'" "\`iu'""'
              *generate strL sourcedofile=fileread(`"test-dofile.do"') in 1
              generate strL sourcedofile = fileread(`"`pfad1s'/`synold'"') in 1
              forvalues num=1/`c(N)' {
                  local oldname=`vold'[`num']
                  local newname=`vnew'[`num']
                  if (!strmatch(`"`macval(oldname)'"',"XX")) {
                      replace sourcedofile=subinstr(sourcedofile,`"`macval(oldname)'"',`"`macval(newname)'"',.) in 1
                  }
                  else {
                      foreach placeholder of local placeholders {
                          local oldphname : subinstr local oldname `"XX"' `"`macval(placeholder)'"' , all
                          local newphname : subinstr local newname `"XX"' `"`macval(placeholder)'"' , all
                          display `"`macval(oldphname)', `macval(newphname)'"'
                          replace sourcedofile=subinstr(sourcedofile,`"`macval(oldphname)'"',`"`macval(newphname)'"',.) in 1
                      }
                  }
              }
              
              generate len = filewrite(`"`pfad1s'/`synew'"', sourcedofile[1], 1)
              
              /* End of Bela's code ################################################ */
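
              One difference from post #5 may explain this result: the block above tests strmatch(..., "XX"), whereas post #5 used the pattern "*XX*". Without the wildcards, strmatch() only matches a string that is exactly "XX", so the else branch that handles placeholders would never be reached. A small check, using one name from the correspondence table:

              ```stata
              // strmatch() compares the whole string against the pattern
              display strmatch("bfpm_lXX_3701", "XX")      // 0
              display strmatch("bfpm_lXX_3701", "*XX*")    // 1

              // strpos() tests for a substring without wildcards
              display strpos("bfpm_lXX_3701", "XX") > 0    // 1
              ```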

              The relevant part of my code looks as follows:

              Code:
              generate strL odofile = fileread(`"`pfad1s'/`synold'"')
              
              quietly    {
                  replace odofile = "" if _n > 1
                  replace odofile = `u'subinstr(odofile, "\`iu'", "*iu*", .) if _n == 1
                  replace odofile = `u'subinstr(odofile, "\`in'", "*in*", .) if _n == 1
              }
              local n=_N
              forvalues i = 1(1)`n'    {
                  local varold =  `u'strtrim(`vold'[`i'])
                  local varnew = `u'strtrim(`vnew'[`i'])
                  if "`varnew'" != "" & `"`varold'"' != `"`varnew'"'    {    
              
                      /* assemble the varnames with placeholders */
                      if `u'strpos(`"`varold'"', "XX") > 0 & `u'strpos(`"`varnew'"', "XX") > 0 {
                          local ostr1 = `u'substr(`"`varold'"',1,6)
                          local nstr1 = `u'substr(`"`varnew'"',1,6)
                          local ostr2 = `u'substr(`"`varold'"',9,`u'strlen(`"`varold'"')-8)
                          local nstr2 = `u'substr(`"`varnew'"',9,`u'strlen(`"`varnew'"')-8)
              
                          local varold1 "`ostr1'`ph1'`ostr2'"
                          local varold2 "`ostr1'`ph2'`ostr2'"
                          local varold3 "`ostr1'`ph3'`ostr2'"
                          
                          local varnew1 "`nstr1'`ph1'`nstr2'"
                          local varnew2 "`nstr1'`ph2'`nstr2'"
                          local varnew3 "`nstr1'`ph3'`nstr2'"
                          
                          quietly    {
                              replace odofile = `u'subinstr(odofile, `"`varold1'"', `"`varnew1'"', .) if _n == 1
                              replace odofile = `u'subinstr(odofile, `"`varold2'"', `"`varnew2'"', .) if _n == 1
                              replace odofile = `u'subinstr(odofile, `"`varold3'"', `"`varnew3'"', .) if _n == 1
                          }
                      }
                      if `u'strpos(`"`varold'"', "XX") == 0 & `u'strpos(`"`varnew'"', "XX") == 0 {    
                          quietly replace odofile = `u'subinstr(odofile, `"`varold'"', `"`varnew'"', .) if _n == 1
                      }
                      
                  }
              }
              
              quietly    {
                  replace odofile = `u'subinstr(odofile, "*iu*", "\`iu'", .) if _n == 1
                  replace odofile = `u'subinstr(odofile,  "*in*", "\`in'", .) if _n == 1
              }
              Here is a screenshot showing an excerpt of the comparison between the results of your syntax version and mine (see below). You can see that in your syntax version the varnames without placeholders have been replaced correctly, but not the varnames with placeholders.
              In the result of my syntax version there are still "bfpm" variables (e.g. line 443), but this is because they are not yet included in the correspondence table.

              If you would like to develop your syntax version further so that it works, send me an email and I will send you the source files. Anyway, thanks a lot for dealing with my problem.

              Greetings, Klaudia


              Here is the screenshot of the results-comparison:
              [Screenshot: image001a.png]









              • #8
                Oh sorry, Daniel, I always addressed you as "Bela". This was because our neighbours' son is named Bela, so in my mind Bela is a first name. Best regards, Klaudia
